ABSTRACT 

This paper proposes a method of handling limited 
problems in dialect research. In approaching the problem, it was 
necessary to devise a system for coding phonetic transcription which 
would take into account the variance in the diacritics of different 
field workers so that none of the material would be lost while 
permitting computer analysis. The design of the program also allows 
the researcher to isolate the significant variables found in the 
dialects examined. The author presents the coding system, the program 
organization and deck assembly instructions, a listing of the program 
and all the subroutines, and the informant coding. An accompanying 
computer print-out is available for inspection at the ERIC 
Clearinghouse for Linguistics, 1717 Massachusetts Avenue, N.W., 
Washington, D.C. 20036. Copies of the print-out are also available 
from the author at the Illinois Institute of Technology, Chicago, 
Illinois 60616. (Author/DO) 
A METHOD FOR AUTOMATING DIALECT ANALYSIS 



Frances Land Uskup 



Dialect study has long been plagued with cumbersome 
analytical techniques. Over the past years, Linguistic 
Atlas data has slowly accumulated in researchers' file 
cabinets while attempts to analyze the great bulk of the 
material have languished for lack of appropriate techniques. 
Automation of data processing in dialect research, as in other 
fields, would seem to be the obvious solution, but there 
has been a general reluctance to implement this type of 
innovation except in limited cases. 

The main reason is the often-expressed fear that much 
of the fine phonetic detail would be lost in coding phonetic 
transcription. Another problem is that the machine print-out 
of data is a mixture of alphanumeric characters when a coding 
system is used, which renders it useless for publication. 

It is not possible with the equipment available to most 
researchers to produce a manuscript list of items, except in 
a coded form. There is no company which offers a type-ribbon 
which contains IPA or Linguistic Atlas symbol notation, or 
for that matter, any combination of symbols which could be 
modified for such use. This "hardware problem" could be 
solved by a sizable research grant which would permit such 
equipment to be produced and installed. 

This paper proposes a method of handling limited problems 
in dialect research. The method used will be presented as 
follows: 

(1) Coding system 

(2) Program organization and deck assembly instructions 

(3) Listing of the program and all the subroutines 

(4) Informant coding 

In approaching the problem, it has been necessary to 
devise a system for coding phonetic transcription which 
would take into account the variance in the diacritics of 
different field workers so that none of the material would 
be lost while permitting computer analysis. It has also been 
necessary to design a general sort program which allows the 
researcher to isolate the significant variables found in 
the dialects examined. 

The program is extremely versatile since it allows for 
extensive manipulation and sorting of the data. Any corpus 
which can be coded by the outlined system can be analyzed by 
this program. It would permit any researcher doing 
dialect studies to analyze large quantities of phonetic data 
cheaply and quickly. The coding system was designed 
for coding phonetic English transcription, but can be 
easily modified for phonemic analysis or for another language. 







MACHINE CODE 

Vowels 

Symbol  Code      Symbol  Code      Symbol  Code 
i       U01       ?       U12       X       U23 
I       U02       9       U13       0       U24 
e       U03       er      U14       A       U25 
e       U04       3       U15       a       U26 
ae      U05       &       U16       T>      U27 
a       U06       a       U17       0       U28 
7       U07       9       U18       u       U29 
Y       U08       vt      U19       TJ      U30 
        U09       u       U20       0       U31 
i       U10       0       U21 
        U11       Til     U22 










Consonants 

Symbol  Code      Symbol  Code      Symbol  Code 
b       B         1,      LX        š       SX 
B       BX        1       LL        t       T 
(3      BB        m       M         0       TT 
č       C         n       N         v       V 
d       D         y       NX        z       Z 
8       DD        p       P         ž       ZX 
f       F         $       PP        w       W 
g       G         9       Q         y       Y 
h       H         r       R         X       XX 
*4 j    J         V       R$        • J     JJ 
k       K         B       RX 
l       L         s       S 









NOTE: Additional symbols can be added at any time to the 
system by combining any of the above symbols, or in 
the case of vowels, by adding additional numbers. 
Vowel Modification 

Symbol  Description         Code 
V A     raised              I01 
V v     lowered             I02 
V<      fronted             I03 
V>      backed              I04 
\       nasalized           I05 
V 1/    weakly nasalized    I06 
V       rounded unrounded   I07 
r V     slightly rounded    I08 
V.'     length              I09 
V:      extra length        I10 
v t     laryngealized       I11 
v *     pharyngealized      I12 
¥       breathy             I13 
V v     glide               I14 
        example: a u        U17I14U29 
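The coding of a vowel with its modifications can be illustrated with a minimal Python sketch. It is not part of the original program. It assumes that vowel codes are the letter U plus two digits and that modification codes are the letter I plus two digits, as in sample cards such as U21I04; the function name `decode_vowels` and the dictionary are hypothetical.

```python
import re

# Vowel modification codes from the table above, read as "I" + two digits
# (an assumption based on the sample data cards).
VOWEL_MODIFICATIONS = {
    "I01": "raised", "I02": "lowered", "I03": "fronted", "I04": "backed",
    "I05": "nasalized", "I06": "weakly nasalized", "I07": "rounded unrounded",
    "I08": "slightly rounded", "I09": "length", "I10": "extra length",
    "I11": "laryngealized", "I12": "pharyngealized", "I13": "breathy",
    "I14": "glide",
}

# One vowel code followed by zero or more modification codes.
VOWEL_RE = re.compile(r"(U\d{2})((?:I\d{2})*)")


def decode_vowels(coded):
    """Return (vowel code, [modification names]) for each vowel in a coded string."""
    result = []
    for vowel, mods in VOWEL_RE.findall(coded):
        codes = re.findall(r"I\d{2}", mods)
        names = [VOWEL_MODIFICATIONS.get(code, "unknown") for code in codes]
        result.append((vowel, names))
    return result


print(decode_vowels("U17I14U29"))
```

For the glide example, the sketch reports U17 with the modification "glide", followed by an unmodified U29.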



Consonant Modification 

Symbol  Description         Code 
(jJ     coarticulation      ?+ 
<?      retroflex           $ 
C o     syllabic            A 
c 1     unreleased          + 
2       devoiced            5 
c<      fronted             # 
c       voiced devoiced     s 
c       voiceless voiced    t 
&       lenis               8/5 
c>      backed              % 
Cw      rounded             # 
c'      aspiration          @ 
c       dentalized          1/4 



PROGRAM ORGANIZATION 



Shown below is a simplified view of the overall 
program structure. 

[Figure: simplified program structure; legible labels: VOWEL AFTER, OUTPUT] 

The variables are referred to as "Key Letter(s)" or the 
"Sorting Base". When a valid Key Letter has been found 
in the linguistic word, either the entire word is listed, or 
the vowel (+ modification) that is present, before or after 
the key letter, is listed. 
SUBROUTINES 

The program has four subroutines which can be used in 
various ways, separately or in combination, depending on 
the needs of the researcher. The data can simply be listed 
for storage (2) or checked for errors in key-punching (1). 
Subroutines (3) and (4) make it possible to search the data 
and to list the environment surrounding the features one 
wishes to isolate, along with item number and informant 
number. For example, one can list the vowels (+ modifications) 
occurring before or after the variable to be analyzed. The 
subroutines are listed below with a brief explanation of 
their function: 

(1) Data Verification  This task is performed in the 
    subroutine VRFY. This subroutine checks a data 
    card for the language code and informant number 
    (for a one-informant group) to verify that it 
    belongs with the language/dialect group of the 
    informant set under analysis. Any erroneous 
    data will be noted as such and listed, but it will 
    not be processed in the analysis section of the 
    program. 

(2) List Word  The subroutine LSTWD lists the pertinent 
    information on the linguistic word that contains 
    the key letter. The output can be from the printer 
    or card punch, or both. The printer will list 
    the page, the item number and the full linguistic 
    word. The card punch duplicates the input card 
    (language/dialect code, informant number, page 
    number, item number and linguistic word). 

(3) Vowel Before  The subroutine VOWB searches for 
    a vowel or a vowel plus modification before the 
    key letter(s), based upon an initial parameter 
    selection. If a vowel is present, the output will 
    list the page, item number and vowel (plus modification). 
    The absence of a vowel in that position, as well as lack 
    of modification, will be noted on the output. 
(4) Vowel After  The subroutine VOWA searches 
    for a vowel or a vowel plus modification after 
    the key letter(s), based on the initial parameters 
    selected. If a vowel occurs, the output will 
    list the page number, item number and the vowel 
    plus modification. The absence of vowel plus 
    modification will be listed on the output. 
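The searches described for VOWB and VOWA can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original subroutines: vowel codes are taken to be U plus two digits, modification codes I plus two digits, and the key letter is located by a plain substring search. The function names and the `with_modification` flag are hypothetical.

```python
import re

# A vowel code plus any modification codes that follow it (assumed layout).
VOWEL_UNIT = r"U\d{2}(?:I\d{2})*"


def _trim(unit, with_modification):
    """Keep the full unit, or only the three-character vowel code."""
    return unit if with_modification else unit[:3]


def vowel_after(word, key, with_modification=True):
    """Vowel directly after the key letter(s); '' if none; None if key is absent."""
    pos = word.find(key)
    if pos < 0:
        return None
    match = re.match(VOWEL_UNIT, word[pos + len(key):])
    return _trim(match.group(0), with_modification) if match else ""


def vowel_before(word, key, with_modification=True):
    """Vowel directly before the key letter(s); '' if none; None if key is absent."""
    pos = word.find(key)
    if pos < 0:
        return None
    match = re.search("(?:" + VOWEL_UNIT + ")$", word[:pos])
    return _trim(match.group(0), with_modification) if match else ""


print(vowel_before("BU21I04RR", "RR"))
print(vowel_after("FFR$U28I02ST", "R$"))
```

A plain substring search does not respect code boundaries, so a key of R would also match the first character of R$ or RR; how the original program treats such cases is not stated in this section.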



PROGRAM INPUT 

The program input consists of the source program and 
the following: 



Control Card 
Parameter Card 
Data Identification Card 
Linguistic Word Data Cards 
Key Letter Data Cards 

(1) Control Card  Only one card is required per 
    computer run. 

    Col. 1-2   Scratch file unit number, IUNIT. This 
               variable designates which disk or tape 
               unit is to be used for temporary storage. 
               NOTE: This number is assigned by 
               the computer center. 

    Col. 3-4   An arbitrary constant, CDAT. This is 
               used to indicate that another data set 
               is to be processed. The value of CDAT 
               can be any two digit number from 30 to 50. 

    An input format of 2I2 is used for the Control 
    Card. 

Sample : 



(2) Parameter Card  At least one card is required 
    per computer run. 

    Col. 1     Parameter Continuation Variable, DFPAR. 
               The value of this variable dictates 
               whether or not the input section of 
               the main program will be used. 
               A blank or zero - input new linguistic 
                                 word data. 
               A 1 thru 9 - bypass input section and 
                            go directly to analysis 
                            section; use previous data. 

    Col. 2     Designation of the type of data to be 
               processed, NAL. 
               A 1 designates a single informant. 
               A 2 designates a group analysis. 

    Col. 3     NEXT designates which subroutine is to 
               be called after the Key Letter has 
               been found in the linguistic word. 
               A 1 calls subroutine LSTWD. 
               A 2 calls subroutine VOWB. 
               A 3 calls subroutine VOWA. 

    Col. 4     IVB designates whether the vowel or 
               the vowel plus modification will be 
               searched for in the subroutines VOWB 
               and VOWA. 
               A 1 designates vowel + modification. 
               A 2 designates vowel only. 

    Col. 5     NVR designates whether or not the data 
               verification subroutine is called. 
               A 1 calls subroutine VRFY. 
               A 2 bypasses subroutine VRFY. 

    Col. 6     NPNCH designates the type of output 
               for subroutine LSTWD. 
               A 1 designates printer only. 
               A 2 designates card punch only. 
               A 3 designates printer and card punch. 

    Col. 7-12  Must be blanks for proper execution of 
               the program. 

    An input format of 6I1,A4,I2 is used for the 
    Parameter Card. 

    Sample: 
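The column layout of the Parameter Card can be illustrated with a short Python sketch. It is a hypothetical reader, not the original input routine. It assumes that a blank in any of the first six columns is read as zero (the card description states this only for column 1) and that a short card is padded with blanks.

```python
from typing import NamedTuple


class Parameters(NamedTuple):
    dfpar: int     # Col. 1: 0 = read new word data, 1-9 = reuse previous data
    nal: int       # Col. 2: 1 = single informant, 2 = group analysis
    next_sub: int  # Col. 3 (NEXT): 1 = LSTWD, 2 = VOWB, 3 = VOWA
    ivb: int       # Col. 4: 1 = vowel + modification, 2 = vowel only
    nvr: int       # Col. 5: 1 = call VRFY, 2 = bypass VRFY
    npnch: int     # Col. 6: 1 = printer, 2 = card punch, 3 = both


def parse_parameter_card(card):
    """Decode columns 1-6 of a Parameter Card; columns 7-12 must be blank."""
    card = card.ljust(12)
    if card[6:12].strip():
        raise ValueError("columns 7-12 must be blank")
    # Assumption: a blank column is read as zero.
    values = [int(ch) if ch.strip() else 0 for ch in card[:6]]
    return Parameters(*values)


print(parse_parameter_card("013110"))
```

The card 013110, which appears in the sample data decks, decodes to new word data, a single informant, subroutine VOWA, vowel plus modification, and verification through VRFY.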
(3) Data Identification Card  One data identification 
    card must precede the Linguistic Word Data Cards. 

    For Group Data: 

    Col. 1-2    Language code. 
    Col. 3-6    Group number. 

    For Single Informant Data: 

    Col. 1-2    Language code. 
    Col. 3-6    Informant number. 
    Col. 7      Sex of informant. 
    Col. 8-9    Age of informant. 
    Col. 10-11  Birthplace of informant. 
    Col. 12-13  Occupation of informant. 

    An input format of A2,A4,I1,3I2 is used for the 
    Data Identification Card. 

    Sample: 

    YE0025            (group data) 
    YE01001621809     (single informant) 

(4) Linguistic Word Data Cards  One data card is 
    used for each linguistic word. 

    Col. 1-2    Language code. 
    Col. 3-6    Informant number. 
    Col. 7-9    Blanks. 
    Col. 10-12  Page number from test sheet. 
    Col. 13-15  Item number on test page (refers to 
                the word being phrased). 
    Col. 16-80  Linguistic coded word. NOTE: the 
                maximum allowable length for the 
                linguistic word is 65 characters. 

    An input format of A2,A4,T10,A3,A3,65A1 is used 
    for Linguistic Word Data Cards. 

    Sample: 

    YE0100ƀƀƀ001002FFR$U28I02ST 

    (a ƀ denotes a blank) 


(5) Key Letter Data Cards  One data card is used 
    for each key letter. 

    Col. 1-2   Number of characters in the Key Letter. 
    Col. 3-52  Key letter(s). NOTE: the maximum 
               allowable length for the key letter(s) 
               is 50 characters. 

    An input format of I2,50A1 is used for the 
    Key Letter Data Cards. 

    Sample: 

    02R$ 
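The fixed-column layouts of the Linguistic Word and Key Letter Data Cards can be read with a few string slices. The following Python sketch is an illustration only; the function names and the dictionary keys are hypothetical, and blanks are represented by ordinary spaces.

```python
def parse_word_card(card):
    """Split a Linguistic Word Data Card into its fields by column position."""
    card = card.ljust(80)               # pad a short card image to 80 columns
    return {
        "language": card[0:2],          # Col. 1-2
        "informant": card[2:6],         # Col. 3-6 (Col. 7-9 are blank)
        "page": card[9:12],             # Col. 10-12
        "item": card[12:15],            # Col. 13-15
        "word": card[15:80].rstrip(),   # Col. 16-80, at most 65 characters
    }


def parse_key_card(card):
    """Return the key letter(s) of a Key Letter Data Card."""
    count = int(card[0:2])              # Col. 1-2: number of characters
    return card[2:2 + count]            # Col. 3 onward: the key letter(s)


print(parse_word_card("YE0100   001002FFR$U28I02ST"))
print(parse_key_card("02R$"))
```

Applied to the sample cards, the first call yields language YE, informant 0100, page 001, item 002 and the coded word FFR$U28I02ST; the second yields the key R$.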



SAMPLE DATA DECK 

Single set of data: 

Col. 1 
| 
0544                          Control Card 
013110                        Parameter Card 
YE01001621809                 Data ID Card 
YE0100ƀƀƀ001001FFR$U14        Linguistic Word Data Cards 
YE0100ƀƀƀ001002BU21I04RR          " 
(blank card)                  indicates end of data 
01R                           Key Letter Data Cards 
02RR                              " 
(blank card)                  indicates end of data 

(See figure A) 
SAMPLE DATA DECK 

Multiple sets of data: 

Col. 1 
| 
0544                          Control Card 
013110                        Parameter Card 
YE01001621809                 Data ID Card 
YE0100ƀƀƀ001001FU16RR...      Linguistic Word Data Cards 
YE0100ƀƀƀ001002TTU14I02... 
  . 
  . 
(blank card)                  indicates end of data 
01R                           Key Letter Data Cards 
02RR 
  . 
44                            CDAT variable - indicates more data 
112110                        new Parameter Card 
01R                           Key Letter Data Cards 
02RR 
44                            CDAT variable 
012110                        new Parameter Card 
YE01011551403                 new Data ID Card 
YE0101ƀƀƀ001001THU23...       new Linguistic Word Data 
YE0101ƀƀƀ001002GXU06I14... 
  . 
  . 
(blank card) 
02TH                          Key Letter Data Card 
02RX 
(blank card)                  indicates end of data 

(See figure B) 
FIGURE B 

Note: 1. A "blank" card always follows the set 
         of Linguistic Word data cards. 

      2. A "CDAT" variable card, as illustrated 
         above, always follows the Key Letter 
         data cards if another set of data is to 
         be processed. 

      3. A "blank" card always follows Key Letter 
         data cards if no more data is to be 
         processed. 
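The three rules in the note can be turned into a small deck reader. The Python sketch below is an illustration under explicit assumptions and is not taken from the program listing: a blank card is an empty or all-space string, the "CDAT" variable card carries the CDAT value in columns 1-2, and a Parameter Card with a blank or zero in column 1 is followed by a Data ID Card and word cards. The function name `split_deck` and the dictionary keys are hypothetical.

```python
def split_deck(cards):
    """Group the cards of a data deck into successive analysis runs."""
    it = iter(cards)
    control = next(it)
    cdat = control[2:4]                   # Col. 3-4 of the Control Card
    runs = []
    more = True
    while more:
        parameter = next(it)
        run = {"parameter": parameter, "data_id": None, "words": [], "keys": []}
        if parameter[:1] in ("", " ", "0"):   # DFPAR: new word data follows
            run["data_id"] = next(it)
            for card in it:
                if not card.strip():          # blank card ends the word cards
                    break
                run["words"].append(card)
        more = False
        for card in it:
            if not card.strip():              # blank card: no more data
                break
            if card[:2] == cdat:              # assumed form of the CDAT card
                more = True
                break
            run["keys"].append(card)
        runs.append(run)
    return control, runs


deck = [
    "0544", "013110", "YE01001621809",
    "YE0100   001001FU16RR", "YE0100   001002TTU14I02", "",
    "01R", "02RR", "44",
    "112110", "01R", "02RR", "44",
    "012110", "YE01011551403",
    "YE0101   001001THU23", "YE0101   001002GXU06I14", "",
    "02TH", "02RX", "",
]
control, runs = split_deck(deck)
print(len(runs), [run["keys"] for run in runs])
```

With the multiple-set deck entered this way, the sketch yields three runs; the second run reuses the word data of the first, so it holds only a Parameter Card and Key Letter cards.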



